helix-agent

Cut Claude Code token usage 82-97% with local LLMs.

The Problem

Claude Code's Max plan quota can vanish in 19 minutes. A single screenshot costs ~15,000 tokens; one DOM snapshot costs ~114,000. Retry loops keep burning tokens until the quota runs out, because Claude Code has no built-in loop detection -- the #1 pain point (666+ upvotes).

The Solution

helix-agent is an MCP server that compresses screenshots, DOM, and browser output through your local GPU before Claude sees them -- and detects retry loops before they drain your quota. Connect it to Claude Code and savings happen automatically; no workflow changes needed.

Measured Results

What	Without	With helix-agent	Reduction
Screenshot analysis	~15,000 tokens	~400 tokens	97%
DOM/HTML processing	~114,000 tokens	~500 tokens	99%
Browser automation	~15,000 tokens/action	~1,000-2,700	82-93%
Retry loops	Infinite (until quota dies)	Stopped at 3rd repeat	100%
Routine tasks	Opus tokens ($$$)	Local LLM ($0)	100%

All compression runs on your local GPU via Ollama. Zero cloud API cost.
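The reduction column can be reproduced from the before/after token counts. The sketch below is only a worked check of the arithmetic; the function name `reduction_percent` is illustrative and not part of helix-agent. Note that the table rounds down (99.56% is listed as 99%).

```python
def reduction_percent(before: int, after: int) -> float:
    """Return the percentage of tokens removed when `before` shrinks to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) * 100 / before


# Token counts copied from the Measured Results table.
rows = {
    "Screenshot analysis": (15_000, 400),
    "DOM/HTML processing": (114_000, 500),
    "Browser automation (best case)": (15_000, 1_000),
    "Browser automation (worst case)": (15_000, 2_700),
}

for name, (before, after) in rows.items():
    print(f"{name}: {reduction_percent(before, after):.1f}% fewer tokens")
```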

Before / After

	Without helix-agent	With helix-agent
Screenshot	15,000 tokens raw image	400 tokens structured text
DOM snapshot	114,000 tokens raw HTML	500 tokens action summary
Retry loop	Runs until quota dies	Stopped at 3rd repeat
Routine task	Opus ($$$)	Local Ollama ($0)
Cloud API cost	$50-200/month in waste	$0

Quick Start

git clone https://github.com/tsunamayo7/helix-agent.git
cd helix-agent && uv sync
ollama pull gemma4:e2b          # 8GB GPU (or e4b/26b/31b for larger)
uv run python server.py

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "helix-agent": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}

Restart Claude Code. Done.

$0 cloud cost. All compression, retry detection, and delegation run locally, with LLM work going to your own GPU via Ollama. No API keys, no subscriptions, no metered billing. Your tokens stay on your machine.

How It Works

Claude Code (Opus)
    |
    +-- helix-agent (MCP server)
           |
           +-- vision_compress ---- Local LLM ----> ~400 tokens  (was 15,000)
           +-- dom_compress ------- Local LLM ----> ~500 tokens  (was 114,000)
           +-- retry_guard -------- Pure logic ----> Loop stopped (sub-ms)
           +-- think / agent_task - Local LLM ----> $0 reasoning
           +-- computer_use ------- agent-browser -> 82-93% saved
           +-- code_review -------- 4-layer LLM --> $0.20 total

Works Everywhere

Platform	GPU	Status
macOS (Apple Silicon)	Metal / M1-M4	Tested daily
Linux	NVIDIA CUDA	Primary dev environment
Windows (WSL2)	NVIDIA CUDA	Supported via Ollama
Windows (native)	NVIDIA CUDA	Supported via Ollama
CPU-only	None	Works (slower, ~30s per compress)

Anywhere Ollama runs, helix-agent runs. 8GB VRAM minimum for GPU acceleration.

Features

Vision Compress -- Screenshot to structured text via local vision LLM. 15,000 tokens to 400.
DOM Compress -- HTML/DOM to structured extract via local LLM. 114,000 tokens to 500.
Retry Guard -- Detects identical tool calls before they loop. Sub-millisecond, no LLM needed.
GPU Auto-Detection -- Detects your GPU at startup, selects the optimal model from 8GB to 96GB+.
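As a rough illustration of how a retry guard can work without an LLM, the sketch below fingerprints each tool call and blocks the third identical one. It is a minimal sketch under stated assumptions, not helix-agent's actual implementation: the class name `RetryGuard`, the method names, and the hashing scheme are all hypothetical.

```python
import hashlib
import json
from collections import Counter


class RetryGuard:
    """Counts identical tool calls and flags the one that reaches the limit."""

    def __init__(self, max_repeats: int = 3) -> None:
        # Assumption: "stopped at 3rd repeat" means the third identical call is blocked.
        self.max_repeats = max_repeats
        self._counts: Counter[str] = Counter()

    @staticmethod
    def _fingerprint(tool: str, args: dict) -> str:
        # sort_keys makes the hash independent of argument order.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def check(self, tool: str, args: dict) -> bool:
        """Record the call and return True if it should be blocked."""
        key = self._fingerprint(tool, args)
        self._counts[key] += 1
        return self._counts[key] >= self.max_repeats

    def reset(self) -> None:
        """Forget all recorded calls."""
        self._counts.clear()
```

Because the check is a dictionary lookup plus a hash, it costs far less than a model call, which is consistent with the sub-millisecond figure quoted above.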

Additional features (27 tools in total)

Browser Automation -- Routes through agent-browser (Rust/CDP) with Playwright fallback. Native keyboard events fix React controlled components.
4-Layer Code Review -- gemma4 + Sonnet + Opus + Codex pipeline; the sample run in the 4-Layer Code Review section surfaced 16+ findings at ~$0.20.
Self-Evolving Memory -- Reviews conversations every 5 turns, saves reusable skills as SKILL.md files. Gets smarter over time, all local.
Parallel Tasks -- Run multiple tasks simultaneously with 2-axis model routing (task type x input size).
ReAct Agents -- Local LLM delegation with tool access, sub-agents, background workers, and JSONL tracing.
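The 2-axis routing mentioned under Parallel Tasks can be pictured as a lookup keyed by task type and input size. The sketch below is purely illustrative: the task type names, the size cut-off, and the model assignments are assumptions, not values taken from helix-agent.

```python
# Hypothetical routing table: (task type, size bucket) -> Ollama model tag.
ROUTES: dict[tuple[str, str], str] = {
    ("summarize", "small"): "gemma4:e2b",
    ("summarize", "large"): "gemma4:e4b",
    ("reason", "small"): "gemma4:e4b",
    ("reason", "large"): "gemma4:26b",
}
DEFAULT_MODEL = "gemma4:e2b"   # assumed fallback for unknown task types
LARGE_INPUT_CHARS = 20_000     # assumed cut-off between "small" and "large"


def size_bucket(text: str) -> str:
    """Classify the input along the size axis."""
    return "large" if len(text) > LARGE_INPUT_CHARS else "small"


def route(task_type: str, text: str) -> str:
    """Pick a model from both axes, falling back to the default model."""
    return ROUTES.get((task_type, size_bucket(text)), DEFAULT_MODEL)
```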

Security: PathGuard

MCP tools that delegate to local LLMs can be tricked into accessing sensitive files. PathGuard prevents this with strict path allowlists -- delegated tools can only read/write directories you explicitly permit.

Defends against CVE-2025-59536 (RCE and API token exfiltration through Claude Code project files).

# PathGuard blocks unauthorized access automatically
HELIX_ALLOWED_PATHS=/home/user/projects,/tmp
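To make the allowlist idea concrete, here is a minimal sketch of a path check driven by `HELIX_ALLOWED_PATHS`. It assumes the variable is a comma-separated list, as in the example above; the helper names `allowed_roots` and `is_allowed` are hypothetical and this is not PathGuard's actual code.

```python
import os
from pathlib import Path


def allowed_roots(env_value: str) -> list[Path]:
    """Parse a comma-separated allowlist into resolved absolute paths."""
    return [Path(p.strip()).resolve() for p in env_value.split(",") if p.strip()]


def is_allowed(candidate: str, roots: list[Path]) -> bool:
    """True only if `candidate` resolves to a location inside one of `roots`."""
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # attempts such as /tmp/../etc/passwd are judged by their real target.
    target = Path(candidate).resolve()
    return any(target.is_relative_to(root) for root in roots)


roots = allowed_roots(os.environ.get("HELIX_ALLOWED_PATHS", "/home/user/projects,/tmp"))
```

Resolving before comparing is the important design choice: comparing raw strings would accept `/tmp/../etc/passwd` and also `/home/user/projects-evil`, whereas `is_relative_to` compares whole path components.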

Real-World Usage

helix-agent runs in production daily on the author's own Claude Code workflow:

367 tests passing (pytest, all Ollama calls mocked)
17+ hour autonomous sessions with retry guard preventing quota drain
27 MCP tools + 3 Resources + 3 Prompts -- full MCP spec coverage
Used to build helix-pilot, helix-codex, and itself (dogfooding)

GPU Auto-Detection

helix-agent auto-selects the best model for your hardware:

Your GPU	VRAM	Model	Compress Speed
RTX 4060	8GB	gemma4:e2b	10.2s
RTX 4070 Ti	16GB	gemma4:e4b	11.8s
RTX 4090 / 3090	24GB	gemma4:26b	14.7s
RTX PRO 6000	48GB+	gemma4:31b	27.5s

gemma4:e2b on an 8GB card compresses about 2.7x faster than gemma4:31b (10.2s vs 27.5s) with comparable compression quality. No expensive GPU required.
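The table above amounts to a tiered lookup on available VRAM. The following sketch shows one way to express it; the function name `select_model` and the fallback to `gemma4:e2b` below 8GB are assumptions, and how helix-agent actually measures VRAM is not shown here.

```python
# VRAM tiers from the table above, largest first: (minimum GB, model tag).
VRAM_TIERS: list[tuple[int, str]] = [
    (48, "gemma4:31b"),
    (24, "gemma4:26b"),
    (16, "gemma4:e4b"),
    (8, "gemma4:e2b"),
]


def select_model(vram_gb: float) -> str:
    """Return the largest model whose tier fits in `vram_gb`."""
    for min_vram, model in VRAM_TIERS:
        if vram_gb >= min_vram:
            return model
    # Assumption: below 8GB (or CPU-only) fall back to the smallest model.
    return "gemma4:e2b"
```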

Vision Pipeline

+--------------+     +-----------------+     +--------------+
| Screenshot   |---->| vision_compress |---->| ~400 tokens  |
| (15K tokens) |     | (local gemma4)  |     | (text only)  |
+--------------+     +-----------------+     +--------------+

+--------------+     +-----------------+     +--------------+
| DOM / HTML   |---->| dom_compress    |---->| ~500 tokens  |
| (114K tokens)|     | (local gemma4)  |     | (text only)  |
+--------------+     +-----------------+     +--------------+

Real measurement (RTX PRO 6000):

Input:  1920x1048 screenshot of X.com (~15,000 tokens)
Output: "X home feed, Japanese UI, 'For You' tab active..." (~400 tokens)
Saved:  7,362 tokens in one call
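For readers who want to see what a screenshot-to-text call against a local model might look like, the sketch below builds a request for Ollama's `/api/generate` endpoint, which accepts base64-encoded images for vision-capable models. The prompt wording and the function name `build_vision_request` are assumptions; helix-agent's own prompt and request handling may differ.

```python
import base64
import json
import urllib.request


def build_vision_request(
    image_bytes: bytes,
    model: str = "gemma4:e2b",
    host: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking a local model to describe an image."""
    body = {
        "model": model,
        # Hypothetical prompt; the real vision_compress prompt is not documented here.
        "prompt": "Describe this screenshot as concise structured text: "
                  "page, visible UI elements, and active state.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a stream
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With Ollama running, the request can be sent with `urllib.request.urlopen(req)`; the generated text is in the `response` field of the JSON reply. Only that text, not the image, would then be passed on to Claude.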

4-Layer Code Review

Automated multi-LLM review at ~$0.20 total:

Layer	Reviewer	Findings	Cost
1	gemma4 + RAG (local)	7	$0
2	Sonnet 4.7	14	~$0.13
3	Opus 4.7 (summary only)	16	~$0.03
4	Codex (P1 only, on-demand)	5	~$0.33
Combined		16+	~$0.20

In this run, gemma4 + RAG ($0) reported more findings (7) than Codex GPT-5.3 (5, at ~$0.33).

What Nothing Else Does

Capability	helix-agent	Alternatives
Screenshot to text (97% cut)	Local vision LLM	No MCP server does this
DOM to text (99% cut)	Local LLM	Playwright MCP sends raw DOM
Retry loop detection	Sub-ms, no LLM	No built-in Claude Code detection
GPU auto-detect + model select	8GB to 96GB+	Manual config required
Self-evolving memory	SKILL.md + Qdrant	Unique to helix-agent
All 3 MCP primitives	27 Tools + 3 Resources + 3 Prompts	Most MCPs implement Tools only

MCP Architecture

27 tools organized by function:

Category	Tools
Token saving	`vision_compress`, `dom_compress`
Loop prevention	`retry_guard_check`, `retry_guard_status`, `retry_guard_reset`
Local delegation	`think`, `agent_task`, `fork_task`, `parallel_tasks`
Vision & browser	`see`, `browse`, `computer_use`
Background agents	`spawn_agent`, `send_agent_input`, `wait_agent`, `list_agents`, `close_agent`
Memory	`evolving_memory_review`, `list_learned_skills`, `get_skill`, `dept_search`, `dept_store`
Code quality	`code_review`
Meta	`providers`, `models`, `config`, `agent_types`

Plus 3 Resources (helix://status, helix://models, helix://config) and 3 Prompts (retry_report, optimize_tokens, setup_guide).

Configuration

helix-agent works with zero configuration. For advanced setups:

# Environment variables (all optional)
OLLAMA_HOST=http://localhost:11434   # Ollama endpoint
HELIX_PROVIDER=ollama               # LLM provider
HELIX_LOG_LEVEL=INFO                # Logging level
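A small sketch of how these optional variables could be read with their documented defaults. The `HelixConfig` dataclass and `load_config` function are hypothetical names used for illustration; the defaults are the values shown in the example above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class HelixConfig:
    ollama_host: str
    provider: str
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> HelixConfig:
    """Read the optional variables, falling back to the example defaults."""
    return HelixConfig(
        # Strip a trailing slash so paths can be appended safely.
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/"),
        provider=env.get("HELIX_PROVIDER", "ollama"),
        log_level=env.get("HELIX_LOG_LEVEL", "INFO").upper(),
    )
```

Passing the environment as a parameter keeps the function easy to test: `load_config({})` returns the defaults without touching the real process environment.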

Optional dependencies:

Qdrant -- shared memory across sessions
Playwright -- browser automation fallback
agent-browser -- recommended for 82-93% browser token savings

Requirements

Python 3.12+
uv
Ollama + any Gemma 4 model:

GPU VRAM	Command	Model Size
8GB	`ollama pull gemma4:e2b`	4GB
16GB	`ollama pull gemma4:e4b`	6GB
24GB	`ollama pull gemma4:26b`	12GB
48GB+	`ollama pull gemma4:31b`	20GB

Related Projects

helix-pilot -- GUI automation MCP server
claude-code-codex-agents -- MCP bridge to Codex CLI
helix-sandbox -- Secure sandbox MCP server

Not a Claude Code Wrapper

helix-agent is an MCP server that Claude Code connects to. It does not wrap, proxy, or re-host Claude Code or the Anthropic API. Fully compliant with Anthropic's Terms of Service.

Contributing

See CONTRIBUTING.md.

License

MIT
